A Column Generation Bound Minimization Approach with PAC-Bayesian Generalization Guarantees

Authors

  • Jean-Francis Roy
  • Mario Marchand
  • François Laviolette
Abstract

The C-bound, introduced in Lacasse et al. [2006], gives a tight upper bound on the risk of the majority vote classifier. Laviolette et al. [2011] designed a learning algorithm named MinCq that outputs a dense distribution on a finite set of base classifiers by minimizing the C-bound, together with a PAC-Bayesian generalization guarantee. In this work, we design a column generation algorithm, named CqBoost, that optimizes the C-bound and outputs a sparse distribution on a possibly infinite set of voters. We also propose a PAC-Bayesian bound for CqBoost that holds for finite sets of base classifiers and for two cases of continuous sets. Finally, we compare the accuracy and the sparsity of CqBoost with MinCq and other state-of-the-art boosting algorithms.
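For reference, the C-bound referred to in the abstract bounds the risk of the Q-weighted majority vote in terms of the first two moments of the vote's margin. A standard statement, with notation assumed to follow Lacasse et al. [2006] and Laviolette et al. [2011], is the following sketch:

\[
R(B_Q) \;\le\; \mathcal{C}_Q \;=\; 1 \;-\; \frac{\bigl(\mu_1(M_Q)\bigr)^2}{\mu_2(M_Q)}\,,
\qquad \text{whenever } \mu_1(M_Q) > 0,
\]

where $B_Q$ is the majority vote weighted by the distribution $Q$ on the voters, $M_Q(x,y) = \mathbf{E}_{h \sim Q}\,[\,y\,h(x)\,]$ is the margin of the vote on example $(x,y)$, and $\mu_1(M_Q)$, $\mu_2(M_Q)$ denote the first and second moments of the margin under the data-generating distribution. Minimizing $\mathcal{C}_Q$ over $Q$ is the objective that both MinCq and CqBoost address, the latter via column generation to keep $Q$ sparse.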


Similar resources

A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering

We formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian ...


PAC-Bayesian Bounds for Discrete Density Estimation and Co-clustering Analysis

We applied the PAC-Bayesian framework to derive generalization bounds for co-clustering. The analysis yielded regularization terms that were absent in the preceding formulations of this task. The bounds suggested that co-clustering should optimize a trade-off between its empirical performance and the mutual information that the cluster variables preserve on row and column indices. Proper regulariza...


A PAC-Bayesian Approach to Minimum Perplexity Language Modeling

Despite the overwhelming use of statistical language models in speech recognition, machine translation, and several other domains, few high probability guarantees exist on their generalization error. In this paper, we bound the test set perplexity of two popular language models – the n-gram model and class-based n-grams – using PAC-Bayesian theorems for unsupervised learning. We extend the boun...


PAC-Bayesian Generalization Bound for Density Estimation with Application to Co-clustering

We derive a PAC-Bayesian generalization bound for density estimation. Similar to the PAC-Bayesian generalization bound for classification, the result has the appealingly simple form of a tradeoff between empirical performance and the KL-divergence of the posterior from the prior. Moreover, the PAC-Bayesian generalization bound for classification can be derived as a special case of the bound for ...


PAC-Bayesian Theory Meets Bayesian Inference

We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood. This provides an alternative explanation to the Bayesian Occam’s razor criteria, under the assumption that the data ...




Publication date: 2016